This take-home exercise aims to perform spatial point patterns analysis of Airbnb listings in Singapore.
Singapore is one of the few global cities that has yet to legalise short-term rentals offered by platforms such as Airbnb. However, it is interesting to note that there are data sets for Singapore available in Inside Airbnb - an independent, non-commercial set of tools and data that allows the exploration of how Airbnb is being used in cities around the world.
As such, we will like to use these data sets to analyse the following:
The data sets used for this analysis include:
The packages used for this analysis include:
packages = c('sf', 'spatstat', 'raster', 'maptools', 'tmap', 'tidyverse')
for (p in packages){
if(!require(p, character.only = T)){
install.packages(p)
}
library(p,character.only = T)
}
st_read() of sf package is used to import the geospatial data, which is in shapefile format.
mrt_sf <- st_read(dsn="data/geospatial",
layer="MRTLRTStnPtt")
Reading layer `MRTLRTStnPtt' from data source
`C:\Users\user\Desktop\SMU\Y4S1\IS415\geniceee\IS415_blog\_posts\2021-09-25-take-home-exercise-2\data\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 185 features and 3 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 6138.311 ymin: 27555.06 xmax: 45254.86 ymax: 47854.2
Projected CRS: SVY21
sg_sf <- st_read(dsn="data/geospatial",
layer="CostalOutline")
Reading layer `CostalOutline' from data source
`C:\Users\user\Desktop\SMU\Y4S1\IS415\geniceee\IS415_blog\_posts\2021-09-25-take-home-exercise-2\data\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 60 features and 4 fields
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: 2663.926 ymin: 16357.98 xmax: 56047.79 ymax: 50244.03
Projected CRS: SVY21
mpsz_sf <- st_read(dsn="data/geospatial",
layer="MP14_SUBZONE_WEB_PL")
Reading layer `MP14_SUBZONE_WEB_PL' from data source
`C:\Users\user\Desktop\SMU\Y4S1\IS415\geniceee\IS415_blog\_posts\2021-09-25-take-home-exercise-2\data\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 323 features and 15 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
Projected CRS: SVY21
From the output message, we can see that:
mrt_sf sf data frame.sg_sf sf data frame.mpsz_sf sf data frame.Before we visualise the geospatial data, we will need to conduct data preprocessing to ensure that there are no invalid geometries and missing values.
[1] 0
[1] 1
[1] 9
There are no invalid geometries in the mrt_sf data frame while the sg_sf data frame and mpsz_sf data frame contains 1 and 9 invalid geometries respectively. We will now proceed to remove the invalid geometries in the sg_sf and mpsz_sf data frames.
sg_sf <- st_make_valid(sg_sf)
mpsz_sf <- st_make_valid(mpsz_sf)
length(which(st_is_valid(sg_sf) == FALSE))
[1] 0
[1] 0
From the output message, we can observe that there are no longer any invalid geometries in the sg_sf and mpsz_sf data frames!
Simple feature collection with 0 features and 3 fields
Bounding box: xmin: NA ymin: NA xmax: NA ymax: NA
Projected CRS: SVY21
[1] OBJECTID STN_NAME STN_NO geometry
<0 rows> (or 0-length row.names)
Simple feature collection with 0 features and 3 fields
Bounding box: xmin: NA ymin: NA xmax: NA ymax: NA
Projected CRS: SVY21
[1] OBJECTID STN_NAME STN_NO geometry
<0 rows> (or 0-length row.names)
Simple feature collection with 0 features and 3 fields
Bounding box: xmin: NA ymin: NA xmax: NA ymax: NA
Projected CRS: SVY21
[1] OBJECTID STN_NAME STN_NO geometry
<0 rows> (or 0-length row.names)
We can see that there are no missing values in all 3 sf data frames.
We will first need to retrieve the coordinate reference system for verification. st_crs() of sf package is used to do this.
st_crs(mrt_sf)
Coordinate Reference System:
User input: SVY21
wkt:
PROJCRS["SVY21",
BASEGEOGCRS["SVY21[WGS84]",
DATUM["World Geodetic System 1984",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]],
ID["EPSG",6326]],
PRIMEM["Greenwich",0,
ANGLEUNIT["Degree",0.0174532925199433]]],
CONVERSION["unnamed",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["Degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["Degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["(E)",east,
ORDER[1],
LENGTHUNIT["metre",1,
ID["EPSG",9001]]],
AXIS["(N)",north,
ORDER[2],
LENGTHUNIT["metre",1,
ID["EPSG",9001]]]]
st_crs(sg_sf)
Coordinate Reference System:
User input: SVY21
wkt:
PROJCRS["SVY21",
BASEGEOGCRS["SVY21[WGS84]",
DATUM["World Geodetic System 1984",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]],
ID["EPSG",6326]],
PRIMEM["Greenwich",0,
ANGLEUNIT["Degree",0.0174532925199433]]],
CONVERSION["unnamed",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["Degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["Degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["(E)",east,
ORDER[1],
LENGTHUNIT["metre",1,
ID["EPSG",9001]]],
AXIS["(N)",north,
ORDER[2],
LENGTHUNIT["metre",1,
ID["EPSG",9001]]]]
st_crs(mpsz_sf)
Coordinate Reference System:
User input: SVY21
wkt:
PROJCRS["SVY21",
BASEGEOGCRS["SVY21[WGS84]",
DATUM["World Geodetic System 1984",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]],
ID["EPSG",6326]],
PRIMEM["Greenwich",0,
ANGLEUNIT["Degree",0.0174532925199433]]],
CONVERSION["unnamed",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["Degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["Degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["(E)",east,
ORDER[1],
LENGTHUNIT["metre",1,
ID["EPSG",9001]]],
AXIS["(N)",north,
ORDER[2],
LENGTHUNIT["metre",1,
ID["EPSG",9001]]]]
From the output messages, we can observe that the EPSG code for all 3 data frames is currently 9001. This is wrong because the EPSG code of projection coordinate system SVY21 is supposed to be 3414, instead of 9001. st_set_crs() of sf package is therefore used to assign the correct EPSG code for the data frames.
mrt_sf <- st_set_crs(mrt_sf, 3414)
sg_sf <- st_set_crs(sg_sf, 3414)
mpsz_sf <- st_set_crs(mpsz_sf, 3414)
We will now use st_crs() of sf package to retrieve the coordinate reference system again.
st_crs(mrt_sf)
Coordinate Reference System:
User input: EPSG:3414
wkt:
PROJCRS["SVY21 / Singapore TM",
BASEGEOGCRS["SVY21",
DATUM["SVY21",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4757]],
CONVERSION["Singapore Transverse Mercator",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["northing (N)",north,
ORDER[1],
LENGTHUNIT["metre",1]],
AXIS["easting (E)",east,
ORDER[2],
LENGTHUNIT["metre",1]],
USAGE[
SCOPE["Cadastre, engineering survey, topographic mapping."],
AREA["Singapore - onshore and offshore."],
BBOX[1.13,103.59,1.47,104.07]],
ID["EPSG",3414]]
st_crs(sg_sf)
Coordinate Reference System:
User input: EPSG:3414
wkt:
PROJCRS["SVY21 / Singapore TM",
BASEGEOGCRS["SVY21",
DATUM["SVY21",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4757]],
CONVERSION["Singapore Transverse Mercator",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["northing (N)",north,
ORDER[1],
LENGTHUNIT["metre",1]],
AXIS["easting (E)",east,
ORDER[2],
LENGTHUNIT["metre",1]],
USAGE[
SCOPE["Cadastre, engineering survey, topographic mapping."],
AREA["Singapore - onshore and offshore."],
BBOX[1.13,103.59,1.47,104.07]],
ID["EPSG",3414]]
st_crs(mpsz_sf)
Coordinate Reference System:
User input: EPSG:3414
wkt:
PROJCRS["SVY21 / Singapore TM",
BASEGEOGCRS["SVY21",
DATUM["SVY21",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4757]],
CONVERSION["Singapore Transverse Mercator",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["northing (N)",north,
ORDER[1],
LENGTHUNIT["metre",1]],
AXIS["easting (E)",east,
ORDER[2],
LENGTHUNIT["metre",1]],
USAGE[
SCOPE["Cadastre, engineering survey, topographic mapping."],
AREA["Singapore - onshore and offshore."],
BBOX[1.13,103.59,1.47,104.07]],
ID["EPSG",3414]]
The EPSG code is now correctly assigned for all 3 sf data frames!!
It is useful to plot a map to visualise the geospatial data, so that we can easily get a preliminary look at the spatial patterns.
tmap_mode('view')
tm_shape(mrt_sf) +
tm_dots()
Since the aspatial datasets are all in CSV format, read_csv() of readr package is used to import them. The outputs are tibble data frames.
It is also important to understand the data that we are working with. glimpse() of dplyr package is therefore used to perform exploratory data analysis.
airbnb2019 <- read_csv("data/aspatial/listings_300619.csv")
airbnb2021 <- read_csv("data/aspatial/listings_290621.csv")
hotels <- read_csv("data/aspatial/hotels.csv")
tourism <- read_csv("data/aspatial/tourism.csv")
glimpse(airbnb2019)
Rows: 8,293
Columns: 16
$ id <dbl> 49091, 50646, 56334, 71609, 7~
$ name <chr> "COZICOMFORT LONG TERM STAY R~
$ host_id <dbl> 266763, 227796, 266763, 36704~
$ host_name <chr> "Francesca", "Sujatha", "Fran~
$ neighbourhood_group <chr> "North Region", "Central Regi~
$ neighbourhood <chr> "Woodlands", "Bukit Timah", "~
$ latitude <dbl> 1.44255, 1.33235, 1.44246, 1.~
$ longitude <dbl> 103.7958, 103.7852, 103.7967,~
$ room_type <chr> "Private room", "Private room~
$ price <dbl> 81, 80, 68, 200, 92, 102, 203~
$ minimum_nights <dbl> 180, 90, 6, 1, 1, 1, 1, 7, 30~
$ number_of_reviews <dbl> 1, 18, 20, 12, 20, 35, 23, 15~
$ last_review <date> 2013-10-21, 2014-12-26, 2015~
$ reviews_per_month <dbl> 0.01, 0.28, 0.21, 0.13, 0.21,~
$ calculated_host_listings_count <dbl> 2, 1, 2, 9, 9, 9, 9, 1, 4, 4,~
$ availability_365 <dbl> 365, 365, 365, 353, 353, 348,~
glimpse(airbnb2021)
Rows: 4,238
Columns: 16
$ id <dbl> 49091, 50646, 56334, 71609, 7~
$ name <chr> "COZICOMFORT LONG TERM STAY R~
$ host_id <dbl> 266763, 227796, 266763, 36704~
$ host_name <chr> "Francesca", "Sujatha", "Fran~
$ neighbourhood_group <chr> "North Region", "Central Regi~
$ neighbourhood <chr> "Woodlands", "Bukit Timah", "~
$ latitude <dbl> 1.44129, 1.33432, 1.44041, 1.~
$ longitude <dbl> 103.7957, 103.7852, 103.7967,~
$ room_type <chr> "Private room", "Private room~
$ price <dbl> 81, 80, 67, 177, 81, 81, 52, ~
$ minimum_nights <dbl> 180, 90, 6, 90, 90, 90, 14, 1~
$ number_of_reviews <dbl> 1, 18, 20, 20, 24, 48, 20, 13~
$ last_review <date> 2013-10-21, 2014-07-08, 2015~
$ reviews_per_month <dbl> 0.01, 0.22, 0.16, 0.29, 0.34,~
$ calculated_host_listings_count <dbl> 2, 1, 2, 4, 4, 4, 50, 50, 7, ~
$ availability_365 <dbl> 365, 365, 365, 365, 365, 365,~
glimpse(hotels)
Rows: 422
Columns: 9
$ NAME <chr> "Jayleen Clarke Quay Hotel", "JEN Singapor~
$ ADDRESSPOSTALCODE <dbl> 59390, 238858, 249716, 399041, 238485, 399~
$ ADDRESSSTREETNAME <chr> "25 New Bridge Road", "277 Orchard Road , ~
$ HYPERLINK <chr> "jayleenclarkequay@gmail.com", "singaporeo~
$ TOTALROOMS <dbl> 20, 499, 565, 42, 81, 33, 17, 634, 56, 451~
$ KEEPERNAME <chr> "James Federick Chong Kah Yean", "Kuok Oon~
$ Lat <dbl> 1.288715, 1.300510, 1.304271, 1.311586, 1.~
$ Lng <dbl> 103.8475, 103.8392, 103.8239, 103.8775, 10~
$ ICON_NAME <chr> "hotel.gif", "hotel.gif", "hotel.gif", "ho~
glimpse(tourism)
Rows: 107
Columns: 17
$ NAME <chr> "Chinatown Heritage Centre, Singapore", "T~
$ DESCRIPTION <chr> "Experience how Singapore’s early Chinese ~
$ ADDRESSSTREETNAME <chr> "48 Pagoda Street", "158 Telok Ayer Street~
$ HYPERLINK <chr> "http://www.singaporechinatown.com.sg/", "~
$ PHOTOURL <chr> "www.yoursingapore.com/content/dam/desktop~
$ URL_PATH <chr> "www.yoursingapore.com/en/see-do-singapore~
$ IMAGE_ALT_TEXT <chr> "Learn more about local Chinese culture at~
$ PHOTOCREDITS <chr> "Joel Chua DY", "Joel Chua DY", "Joel Chua~
$ LASTMODIFIED <dttm> 2015-11-02 02:16:52, 2015-11-02 02:20:57,~
$ LATITUDE <dbl> 1.283510, 1.280940, 1.310070, 1.277219, 1.~
$ LONGTITUDE <dbl> 103.8444, 103.8476, 103.8994, 103.8373, 10~
$ META_DESCRIPTION <chr> "At the Chinatown Heritage Centre, experie~
$ OPENING_HOURS <chr> "Daily, 9am – 8pm,Last entry at 7pm.*China~
$ Lat <dbl> 1.283510, 1.280940, 1.310070, 1.277219, 1.~
$ Lng <dbl> 103.8444, 103.8476, 103.8994, 103.8373, 10~
$ ICON_NAME <chr> "tourist_spot.gif", "tourist_spot.gif", "t~
$ ADDRESSPOSTALCODE <dbl> NA, NA, NA, 0, NA, NA, NA, NA, NA, NA, NA,~
From the output messages, we can observe that all 4 data frames have the columns Latitude and Longitude, and the values are in decimal degrees. This implies that they are using the WGS84 referencing system. It is also worthy to note that these 2 columns are in numeric data types.
Before proceeding further we will need to conduct data preprocessing to ensure that there are no missing values in the columns Latitude and Longitude for all 4 data frames.
We are only looking at these 2 columns because the next step covered in Section 5.2 does not allow for missing values. In addition, the respective datasets will be dropped as part of geospatial data wrangling later in Section 6.0. As such, we do not need to care about NA values outside of the 2 above-mentioned columns.
airbnb2019[is.na(airbnb2019$latitude)!=0,]
# A tibble: 0 x 16
# ... with 16 variables: id <dbl>, name <chr>, host_id <dbl>,
# host_name <chr>, neighbourhood_group <chr>, neighbourhood <chr>,
# latitude <dbl>, longitude <dbl>, room_type <chr>, price <dbl>,
# minimum_nights <dbl>, number_of_reviews <dbl>,
# last_review <date>, reviews_per_month <dbl>,
# calculated_host_listings_count <dbl>, availability_365 <dbl>
airbnb2019[is.na(airbnb2019$longitude)!=0,]
# A tibble: 0 x 16
# ... with 16 variables: id <dbl>, name <chr>, host_id <dbl>,
# host_name <chr>, neighbourhood_group <chr>, neighbourhood <chr>,
# latitude <dbl>, longitude <dbl>, room_type <chr>, price <dbl>,
# minimum_nights <dbl>, number_of_reviews <dbl>,
# last_review <date>, reviews_per_month <dbl>,
# calculated_host_listings_count <dbl>, availability_365 <dbl>
airbnb2021[is.na(airbnb2021$latitude)!=0,]
# A tibble: 0 x 16
# ... with 16 variables: id <dbl>, name <chr>, host_id <dbl>,
# host_name <chr>, neighbourhood_group <chr>, neighbourhood <chr>,
# latitude <dbl>, longitude <dbl>, room_type <chr>, price <dbl>,
# minimum_nights <dbl>, number_of_reviews <dbl>,
# last_review <date>, reviews_per_month <dbl>,
# calculated_host_listings_count <dbl>, availability_365 <dbl>
airbnb2021[is.na(airbnb2021$longitude)!=0,]
# A tibble: 0 x 16
# ... with 16 variables: id <dbl>, name <chr>, host_id <dbl>,
# host_name <chr>, neighbourhood_group <chr>, neighbourhood <chr>,
# latitude <dbl>, longitude <dbl>, room_type <chr>, price <dbl>,
# minimum_nights <dbl>, number_of_reviews <dbl>,
# last_review <date>, reviews_per_month <dbl>,
# calculated_host_listings_count <dbl>, availability_365 <dbl>
hotels[is.na(hotels$Lat)!=0,]
# A tibble: 0 x 9
# ... with 9 variables: NAME <chr>, ADDRESSPOSTALCODE <dbl>,
# ADDRESSSTREETNAME <chr>, HYPERLINK <chr>, TOTALROOMS <dbl>,
# KEEPERNAME <chr>, Lat <dbl>, Lng <dbl>, ICON_NAME <chr>
hotels[is.na(hotels$Lng)!=0,]
# A tibble: 0 x 9
# ... with 9 variables: NAME <chr>, ADDRESSPOSTALCODE <dbl>,
# ADDRESSSTREETNAME <chr>, HYPERLINK <chr>, TOTALROOMS <dbl>,
# KEEPERNAME <chr>, Lat <dbl>, Lng <dbl>, ICON_NAME <chr>
tourism[is.na(tourism$LONGTITUDE)!=0,]
# A tibble: 1 x 17
NAME DESCRIPTION ADDRESSSTREETNAME HYPERLINK PHOTOURL URL_PATH
<chr> <chr> <chr> <chr> <chr> <chr>
1 Cruis~ Watch the Si~ <NA> <NA> www.your~ www.your~
# ... with 11 more variables: IMAGE_ALT_TEXT <chr>,
# PHOTOCREDITS <chr>, LASTMODIFIED <dttm>, LATITUDE <dbl>,
# LONGTITUDE <dbl>, META_DESCRIPTION <chr>, OPENING_HOURS <chr>,
# Lat <dbl>, Lng <dbl>, ICON_NAME <chr>, ADDRESSPOSTALCODE <dbl>
tourism[is.na(tourism$LATITUDE)!=0,]
# A tibble: 1 x 17
NAME DESCRIPTION ADDRESSSTREETNAME HYPERLINK PHOTOURL URL_PATH
<chr> <chr> <chr> <chr> <chr> <chr>
1 Cruis~ Watch the Si~ <NA> <NA> www.your~ www.your~
# ... with 11 more variables: IMAGE_ALT_TEXT <chr>,
# PHOTOCREDITS <chr>, LASTMODIFIED <dttm>, LATITUDE <dbl>,
# LONGTITUDE <dbl>, META_DESCRIPTION <chr>, OPENING_HOURS <chr>,
# Lat <dbl>, Lng <dbl>, ICON_NAME <chr>, ADDRESSPOSTALCODE <dbl>
We can see that there is a missing value in the Latitude and Longitude columns for the tourism data frame and it belongs to the observation “Cruise from Singapore”. Although “Cruise from Singapore” is considered a tourist attraction, it does not belong to a specific location. We will therefore need to remove it.
tourism <- tourism[!is.na(tourism$LATITUDE),]
We will now use the earlier code chunk to check if the missing value in the tourism data frame is removed.
tourism[is.na(tourism$LONGTITUDE)!=0,]
# A tibble: 0 x 17
# ... with 17 variables: NAME <chr>, DESCRIPTION <chr>,
# ADDRESSSTREETNAME <chr>, HYPERLINK <chr>, PHOTOURL <chr>,
# URL_PATH <chr>, IMAGE_ALT_TEXT <chr>, PHOTOCREDITS <chr>,
# LASTMODIFIED <dttm>, LATITUDE <dbl>, LONGTITUDE <dbl>,
# META_DESCRIPTION <chr>, OPENING_HOURS <chr>, Lat <dbl>,
# Lng <dbl>, ICON_NAME <chr>, ADDRESSPOSTALCODE <dbl>
tourism[is.na(tourism$LATITUDE)!=0,]
# A tibble: 0 x 17
# ... with 17 variables: NAME <chr>, DESCRIPTION <chr>,
# ADDRESSSTREETNAME <chr>, HYPERLINK <chr>, PHOTOURL <chr>,
# URL_PATH <chr>, IMAGE_ALT_TEXT <chr>, PHOTOCREDITS <chr>,
# LASTMODIFIED <dttm>, LATITUDE <dbl>, LONGTITUDE <dbl>,
# META_DESCRIPTION <chr>, OPENING_HOURS <chr>, Lat <dbl>,
# Lng <dbl>, ICON_NAME <chr>, ADDRESSPOSTALCODE <dbl>
There are no missing values remaining in the tourism data frame!!
Since these datasets will be used for spatial point pattern analysis later, they will need to be converted to sf data frames to assist further geospatial data wrangling in Section 6.0.
st_as_sf() of sf package is used to convert the airbnb2019, airbnb2021, hotels, tourism data frames into sf data frames.
As discovered in the exploratory data analysis performed earlier, the data frames are using the WGS84 referencing system. We will thereby need to assign EPSG code of 4326 before transforming the newly created sf data frames into SVY21 projected coordinated system using st_transform() of sf package.
airbnb2019_sf <- st_as_sf(airbnb2019,
coords = c("longitude", "latitude"),
crs=4326) %>%
st_transform(crs=3414)
airbnb2021_sf <- st_as_sf(airbnb2021,
coords = c("longitude", "latitude"),
crs=4326) %>%
st_transform(crs=3414)
hotels_sf <- st_as_sf(hotels,
coords = c("Lng", "Lat"),
crs=4326) %>%
st_transform(crs=3414)
tourism_sf <- st_as_sf(tourism,
coords = c("LONGTITUDE", "LATITUDE"),
crs=4326) %>%
st_transform(crs=3414)
Similarly, we will want to plot a map to visualise the converted geospatial data to easily get a preliminary look at the spatial patterns.
tmap_mode('view')
tm_shape(airbnb2019_sf) +
tm_dots(alpha=0.4,
col="blue",
size= 0.02) +
tm_shape(hotels_sf) +
tm_dots(alpha=0.4,
col="red",
size= 0.02) +
tm_shape(tourism_sf) +
tm_dots(alpha=0.4,
col="green",
size= 0.02)